Basketball: how likely is it to score?

When playing basketball we can ask ourselves: how likely is it to score from a given position on the court? To answer this question we are going to use data from NBA games of the 2006–2007 season. We will consider all types of shots.

6 rows × 7 columns

 row   result     x       y   period      time   distance      angle
   1        0   -14   16.75        1  11:42:00    21.8303  -0.874592
   2        1    -9   28.75        1  11:22:00    30.1258   -1.26742
   3        0   -16   18.75        1  11:07:00    24.6488   -0.86437
   4        0    -6    2.75        1  10:47:00    6.60019  -0.429762
   5        0   -13   12.75        1  10:34:00    18.2089   -0.77569
   6        1   -19   19.75        1  10:27:00    27.4055  -0.804751

But how do we interpret the data?

In the sketch below we show a drawing of a basketball court, its dimensions and how to interpret the data in the table.

So, the x and y axes have their origin at the hoop, and we compute the distance from this point to where the shot was made. We also compute the angle with respect to the x axis, shown as θ in the sketch. The data also includes the period in which the shot was made, which can take values from 1 to 4.
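These two derived columns can be checked directly (a small Python sketch; the notebook itself runs Julia): the distance is the Euclidean norm of (x, y), and the angle is arctan(y/x), matching the first row of the table.

```python
import math

# First row of the table: shot taken at x = -14, y = 16.75 (hoop at the origin)
x, y = -14, 16.75

# Distance from the hoop to the point where the shot was taken
distance = math.hypot(x, y)

# Angle with respect to the x axis (theta in the sketch),
# computed as atan(y / x), which matches the table's sign convention
angle = math.atan(y / x)

print(round(distance, 4))  # 21.8303, as in the table
print(round(angle, 6))     # -0.874592, as in the table
```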

We now plot where the shots were made:

We see that the shots are quite uniformly distributed around the hoop, except for distances very close to it. To see this better, we plot the histograms for each axis, x and y.

But we are interested in the shots that were scored, so we now filter for the shots that were made and plot the histogram of each axis.

If we plot the counts in 3D, we obtain the wireframe plot shown below.

We see that more shots are made as we get closer to the hoop, as expected.

It is important to notice that we are not showing the probability of scoring; we are just showing the distribution of shots scored, not how likely it is to score.
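To make the distinction concrete, a toy Python sketch (the numbers here are made up, not the NBA data): the probability of scoring in a distance bin is the number of made shots divided by the number of attempts, which the counts alone do not reveal.

```python
# Hypothetical counts in three distance bins (near, mid, far):
attempted = [500, 300, 200]   # shots attempted in each bin
made      = [300, 120,  40]   # shots made in the same bins

# Counts of made shots alone don't give the probability of scoring;
# we must divide by the attempts in each bin.
prob = [m / a for m, a in zip(made, attempted)]
print(prob)  # [0.6, 0.4, 0.2]
```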

Modeling the probability of scoring

The first model we are going to propose is a Bernoulli model.

Why a Bernoulli Distribution?

A Bernoulli distribution results from an experiment with 2 possible outcomes, one usually called a success and the other a failure. In our case the success is scoring the shot, and the other possible event is missing it.
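A quick Python sketch of this idea: simulating many Bernoulli trials with success probability p, the empirical frequency of successes approaches p.

```python
import random

random.seed(0)

# Simulate 10_000 shots, each scored with a fixed probability p
p = 0.45
shots = [1 if random.random() < p else 0 for _ in range(10_000)]

# The empirical frequency of successes should be close to p
freq = sum(shots) / len(shots)
print(freq)  # close to 0.45
```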

The only parameter needed in a Bernoulli distribution is the probability p of a success. We are going to model this parameter with a logistic function:

Why a logistic function?

We are going to model the probability of scoring as a function of some variables, for example the distance to the hoop, and we want our probability of scoring to increase as we get closer to it. Also, our probability needs to be between 0 and 1, so a nice function to map our values is the logistic function.
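A minimal Python sketch of the logistic function (the notebook itself is written in Julia), showing how it squashes any real input into the interval (0, 1):

```python
import math

def logistic(z):
    """Map any real number into the interval (0, 1)."""
    return 1 / (1 + math.exp(-z))

print(logistic(0))    # 0.5
print(logistic(10))   # close to 1
print(logistic(-10))  # close to 0
```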

So, the model we are going to propose is:

p = logistic(a + b * distance[i] + c * angle[i])

outcome[i] ~ Bernoulli(p)

But what values and prior distributions are we going to propose for the parameters a, b and c?

Let's see:

Prior Predictive Checks: Part I

Suppose we say that our prior distributions for a, b and c are going to be 3 Gaussian distributions with mean 0 and variance 1. Let's sample and see what the possible prior predictive curves for our probability of scoring p look like:

a ~ N(0, 1)

b ~ N(0, 1)

c ~ N(0, 1)
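The prior predictive simulation can be sketched in a few lines of Python (a simplified stand-in for the notebook's Julia code): draw one set of parameters from the priors and trace the resulting curve of p over the normalized distance.

```python
import math
import random

random.seed(1)

def logistic(z):
    return 1 / (1 + math.exp(-z))

# Sample one set of parameters from the Normal(0, 1) priors
a = random.gauss(0, 1)
b = random.gauss(0, 1)
c = random.gauss(0, 1)

angle = 0.0
distances = [i / 10 for i in range(11)]  # normalized distance in [0, 1]
p = [logistic(a + b * d + c * angle) for d in distances]

# With b free to take positive values, some prior curves *increase*
# with distance, the unreasonable behaviour discussed in the text.
print(p[0], p[-1])
```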

We see that some of the predicted behaviours for p don't make sense. For example, if b takes positive values, we are saying that as we increase our distance from the hoop, the probability of scoring also increases. So we propose instead that the parameter b take the negative values of a LogNormal distribution. The predicted values for p are shown below.

So our model now has as prior distributions:

a ~ Normal(0, 1)

b ~ LogNormal(1, 0.25)

c ~ Normal(0, 1)
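In Python terms (a sketch of the same idea, with the LogNormal sample negated as the text describes), every curve drawn from these priors now decreases with distance:

```python
import math
import random

random.seed(2)

def logistic(z):
    return 1 / (1 + math.exp(-z))

a = random.gauss(0, 1)
b = -random.lognormvariate(1, 0.25)  # negative values of a LogNormal(1, 0.25)

distances = [i / 10 for i in range(11)]  # normalized distance in [0, 1]
p = [logistic(a + b * d) for d in distances]

# b is always negative, so every sampled curve decreases with distance
print(p[0] > p[-1])  # True
```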

and sampling values from those prior distributions, we obtain the plot shown below for the predicted values of p.

Now that we have the expected behaviour for p, we define our model and calculate the posterior distributions with our data points.

Defining our model and computing posteriors

Now we define our model to sample from it:

logistic_regression (generic function with 1 method)

The output of the sampling also gives us some information about the sampled values of our parameters, such as the mean, the standard deviation and some other statistics.

chain
Chains MCMC chain (1500×12×3 Array{Float64,3}):

Iterations        = 1:1500
Thinning interval = 1
Chains            = 1, 2, 3
Samples per chain = 1500
parameters        = a, b, c
internals         = acceptance_rate, hamiltonian_energy, hamiltonian_energy_error, is_accept, log_density, lp, n_steps, nom_step_size, step_size

Summary Statistics
  parameters      mean       std   naive_se      mcse          ess      rhat  
      Symbol   Float64   Float64    Float64   Float64      Float64   Float64  
                                                                              
           a    0.1496    0.1223     0.0018    0.0016    5777.9240    0.9997  
           b    1.4944    0.2412     0.0036    0.0027   12842.2476    0.9998  
           c   -0.0251    0.1087     0.0016    0.0054     369.8630    1.0068  

Quantiles
  parameters      2.5%     25.0%     50.0%     75.0%     97.5%  
      Symbol   Float64   Float64   Float64   Float64   Float64  
                                                                
           a   -0.0613    0.0720    0.1479    0.2264    0.3696  
           b    1.0605    1.3317    1.4860    1.6475    1.9873  
           c   -0.1760   -0.0745   -0.0229    0.0260    0.1219  

Traceplot

In the plot below we show a traceplot of the sampling.

What is a traceplot?

When we run a model and calculate the posterior, we obtain sampled values from the posterior distributions. We can tell our sampler how many samples we want. A traceplot simply shows them in sequential order. We can also plot the distribution of those values, and this is what is shown next to each traceplot.
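The idea can be sketched in Python (a conceptual stand-in, not the notebook's plotting code): the trace is just the sequence of sampled values, and the accompanying panel is a histogram of those same values.

```python
import random

random.seed(3)

# Fake a "chain" of 1500 samples, as in the chains above
chain = [random.gauss(0, 1) for _ in range(1500)]

# The trace is the sequence itself: (iteration, value) pairs
trace = list(enumerate(chain))

# Next to the traceplot one shows the distribution of the same values,
# here as a crude histogram of counts per unit-width bin on [-4, 4]
bins = [0] * 8
for v in chain:
    i = min(max(int(v + 4), 0), 7)  # bin edges at -4, -3, ..., 4 (clamped)
    bins[i] += 1

print(sum(bins))  # 1500: every sample falls in exactly one bin
```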

Now plotting the probability of scoring using the posterior distributions of a, b and c for an angle of 45°, we obtain:

The plot shows that the probability of scoring is higher as our distance to the hoop decreases, which makes sense, since the difficulty of scoring increases with distance.
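Using the posterior means from the summary above (a ≈ 0.15, b ≈ 1.49, c ≈ −0.03), a rough Python sketch of this curve; note that the sign of b here is an assumption, following the text's choice of negative LogNormal values for the distance coefficient:

```python
import math

def logistic(z):
    return 1 / (1 + math.exp(-z))

# Posterior means from the summary table; b is negated here, an
# assumption consistent with the "negative LogNormal" prior in the text
a, b, c = 0.1496, -1.4944, -0.0251
angle = math.pi / 4  # 45 degrees

distances = [i / 10 for i in range(11)]  # normalized distance in [0, 1]
p = [logistic(a + b * d + c * angle) for d in distances]

# The probability drops as the normalized distance grows
print(round(p[0], 3), round(p[-1], 3))
```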

We now plot how the probability varies with the angle for a given distance. Here we plot it for a mid-range distance, corresponding to 0.5 in normalized distance.


We see that the model predicts an almost constant probability with respect to the angle.

New model and prior predictive checks: Part II

Now we propose another model with the form:

p = logistic(a + b^distance[i] + c * angle[i])

But for what values of b does the model make sense?

We show below the plot of 4 functions with 4 possible values of b, keeping in mind that the values of x, the normalized distance, go from 0 to 1.


Analysing the possible values for b, the one that makes sense is the value proposed in f1, since we want the influence of the distance on p to increase as the distance decreases, because the logistic function takes higher values for higher values of its argument.

So, now that we know the values our parameter b can take, we propose for it a Beta distribution with parameters α=2 and β=5, shown in the plot below.
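A quick Python sketch of this prior: a Beta(2, 5) has support (0, 1), as the analysis above requires, and its mean is α/(α+β) = 2/7 ≈ 0.286, which we can check by sampling.

```python
import random

random.seed(4)

# Sample from a Beta(2, 5) prior for b; its support is (0, 1)
# and its theoretical mean is alpha / (alpha + beta) = 2/7
samples = [random.betavariate(2, 5) for _ in range(10_000)]

mean = sum(samples) / len(samples)
print(mean)  # close to 2/7 ≈ 0.286
```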


Defining the new model and computing posteriors

We then define our model and calculate the posterior as before.

logistic_regression_exp (generic function with 1 method)
chain_exp
Chains MCMC chain (1500×12×3 Array{Float64,3}):

Iterations        = 1:1500
Thinning interval = 1
Chains            = 1, 2, 3
Samples per chain = 1500
parameters        = a, b, c
internals         = acceptance_rate, hamiltonian_energy, hamiltonian_energy_error, is_accept, log_density, lp, n_steps, nom_step_size, step_size

Summary Statistics
  parameters      mean       std   naive_se      mcse        ess      rhat  
      Symbol   Float64   Float64    Float64   Float64    Float64   Float64  
                                                                            
           a   -0.9647    0.1578     0.0024    0.0105   142.9505    1.0332  
           b    0.2192    0.1453     0.0022    0.0097   125.0977    1.0401  
           c   -0.0051    0.0934     0.0014    0.0045   510.1481    1.0064  

Quantiles
  parameters      2.5%     25.0%     50.0%     75.0%     97.5%  
      Symbol   Float64   Float64   Float64   Float64   Float64  
                                                                
           a   -1.2481   -1.0744   -0.9721   -0.8609   -0.6392  
           b    0.0120    0.1045    0.1971    0.3116    0.5522  
           c   -0.1502   -0.0490   -0.0003    0.0472    0.1343  

Plotting the traceplot, we see again that the angle variable has little importance, since the parameter c, which can be related to the importance of the angle for the probability of scoring, is centered at 0.


Employing the posterior distributions computed, we plot the probability of scoring as a function of the normalized distance and obtain the plot shown below.


Given that we have 2 variables, we can plot the mean probability of scoring as a function of both and obtain a surface plot. We show this below.


The plot shows the expected behaviour: an increasing probability of scoring as we get near the hoop. We also see that there is almost no variation of the probability with the angle.

Does the Period affect the probability of scoring?

Now we will try to answer this question. We propose a model and calculate the posterior for its parameters with the data of each of the four periods, defining the same model for all of them. Also, we no longer take into account the angle variable, since we have seen that it is of little importance.

We then filter our data by period and proceed to estimate our posterior distributions.
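The filtering step can be sketched in Python with hypothetical records (the real data is a Julia DataFrame): group the shots by their period value before fitting one model per group.

```python
# Hypothetical shot records: (result, distance, period)
shots = [
    (1, 10.2, 1), (0, 23.5, 2), (1, 4.1, 1),
    (0, 18.0, 3), (1, 7.3, 4), (0, 25.9, 4),
]

# Filter the data by period, one subset per quarter, before fitting
# the same model independently to each subset
by_period = {q: [s for s in shots if s[2] == q] for q in (1, 2, 3, 4)}

print([len(by_period[q]) for q in (1, 2, 3, 4)])  # [2, 1, 1, 2]
```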

logistic_regression_period (generic function with 1 method)
chain_period1
Chains MCMC chain (1500×11×3 Array{Float64,3}):

Iterations        = 1:1500
Thinning interval = 1
Chains            = 1, 2, 3
Samples per chain = 1500
parameters        = a, b
internals         = acceptance_rate, hamiltonian_energy, hamiltonian_energy_error, is_accept, log_density, lp, n_steps, nom_step_size, step_size

Summary Statistics
  parameters      mean       std   naive_se      mcse        ess      rhat  
      Symbol   Float64   Float64    Float64   Float64    Float64   Float64  
                                                                            
           a   -0.9605    0.1559     0.0023    0.0077   367.5171    1.0055  
           b    0.2223    0.1407     0.0021    0.0078   297.9981    1.0078  

Quantiles
  parameters      2.5%     25.0%     50.0%     75.0%     97.5%  
      Symbol   Float64   Float64   Float64   Float64   Float64  
                                                                
           a   -1.2515   -1.0630   -0.9598   -0.8578   -0.6557  
           b    0.0255    0.1140    0.1971    0.3067    0.5520  

We now plot, for each period, the probability of scoring: its mean and a band of one standard deviation around it.


Finally, we see that for periods 1 and 4, the first and the last, the probability of scoring is slightly higher than in the other two periods, meaning that players are somewhat better at scoring in those periods.
